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ABSTRACT 

We revisit the issue of non-parametric gravitational lens reconstruction and present a 
new method to obtain the cluster mass distribution using strong lensing data without 
using any prior information on the underlying mass. The method relies on the de- 
composition of the lens plane into individual cells. We show how the problem in this 
approximation can be expressed as a system of linear equations for which a solution 
can be found. Moreover, we propose to include information about the null space. That 
is, make use of the pixels where we know there are no arcs above the sky noise. The 
only prior information is an estimation of the physical size of the sources. No priors 
on the luminosity of the cluster or shape of the halos are needed thus making the 
results very robust. In order to test the accuracy and bias of the method we make use 
of simulated strong lensing data. We find that the method reproduces accurately both 
the lens mass and source positions and provide error estimates. 

Key words: galaxies:clusters:general; mcthods:data analysis; dark matter 



1 INTRODUCTION 

Analysis of strong lensing images in galaxy clusters is one 
of the more subjective and intuitive fields of modern astron- 
omy (see Blandford & Narayan 1992, Schneider, Ehlers & 
Falco 1993, Wambsganss 1998, Narayan & Bartelmann 1999, 
Kneib 2002, for comprehensive reviews on gravitational lens- 
ing). The common use of parametric models means making 
educated choices about the cluster mass distribution, for 
instance that the dark matter haloes follow the luminous 
matter in the cluster or that galaxy profiles posses certain 
symmetries. Once these choices are made, one can only hope 
they are the right ones. The recent improvement in strong 
lensing data however, is impressive and should increasingly 
allow us to extract information using fewer assumptions. For 
instance, it should allow us to relax assumptions about the 
mass distributions of the cluster and instead enable us to 
test this. The approach of testing rather than assuming the 
underlying physics has now been used in other branches of 
astrophysics and cosmology (e.g. Tegmark 2002) but has so 
far been largely absent in strong lensing analyses. 

There are two relatively different fields within the area 
of strong lensing, namely lensing by galaxies and galaxy 
clusters. These are different both in appearance as well as 
in abundance. Galaxy-cluster systems are few and far be- 
tween, only the most abnormally dense clusters have surface 
densities greater than the critical density for lensing. How- 



ever, they are much more spectacular than their galaxy lens 
counterparts. The most massive clusters are able to create 
multiple images of extended objects such as distant galax- 
ies with image separations of up to ~ 1 arcminute. Arguably 
the most impressive system, A1689 (Broadhurst et al. 2005), 
boasts more than 100 multiple images of some 30-f sources. 
Galaxy lens systems are of course on a much smaller scale 
with image separations of a few arcseconds. They are far 
more abundant, but less impressive. The lensed objects that 
we are able to observe are typically high redshift quasars 
due to their high luminosity and point like structure, al- 
though galaxy-galaxy lensing has been observed ( Brainerd, 
Blandford & Small 1996, Hudson et al. 1998, Guzik & Seljak 
2002). 

The classification of strong lensing systems into these 
two groups is not merely a matter of scale. The baryons in 
galaxies have had time to cool and form the visible galaxy, 
thereby giving a cuspy profile, suitable for lensing. The cool- 
ing time for galaxy clusters exceed the Hubble time and the 
density profile is therefore far less cuspy, making them less 
ideal lenses. Although cluster lens systems are scarce, the im- 
pressive number of lensed images in each system, mean they 
still contain a lot of information, particularly regarding clus- 
ter mass profiles. The information is harder to extract due 
to the relatively complicated gravitational potentials, but if 
this challenge can be overcome they could be highly useful 
probes, certainly of cluster physics and potentially also for 
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cosmology (Yamamoto & Futamasc 2001, Yamamoto et al. 

2001, Chiba & Takahashi 2002, Golse ct al. 2002, Sereno 

2002, Meneghetti et al 2005) The intention of this paper is 
to improve our methods for extracting this information. 

Although alternative approaches have been suggested to 
recover the density field in both the weak and strong lensing 
regime, (see for instance Kaiser & Squires 1993, Broadhurst 
et al. 1995, Kaiser 1995, Schneider 1995, Schneider & Seitz 

1995, Seitz & Schneider 1995, Bartelmann et al. 1996, Taylor 
et al. 1998, Tyson et al. 1998, Bridle et al. 1998, Marshall et 
al. 1998), the standard approach to modeling strong cluster 
lenses is using parametric methods. This is motivated by the 
fact that the data usually do not contain more than a few 
arcs. This is not enough to constrain the mass distribution 
without the help of a paxametrization. Parametric methods 
rely heavily on assumptions or priors on the mass distribu- 
tion (Kochanek & Blandford 1991, Kneib et al. 1993, 1995, 

1996, 2003, CoUey et al. 1996, Tyson et al. 1998, Broad- 
hurst et al. 2000, 2005, Sand et al. 2002). A common prior 
is the assumption that there is a smooth dark matter com- 
ponent which is correlated spatially with the centroid of the 
luminous matter in the cluster. The mass is then ususally 
modeled by a large smooth dark matter halo placed on top 
of the central galaxy or the centroid of the luminous matter, 
as well as smaller dark matter haloes located in the positions 
of the other luminous galaxies. The parameters of each halo 
are then adjusted to best reproduce the observations. 

There is plenty of subjectivity involved in this process, 
particularly in the addition of the dominant dark matter 
component to the cluster. The assumption that the dark 
matter follows the luminosity is necessary but remains the 
Achilles heel of parametric lens modelling. For large clus- 
ters the number of parameters in the parametric lens model 
quickly becomes large but there is still no guarantee that the 
parametric model used, is in fact capable of reproducing well 
the mass distribution. It is not hard to envisage complica- 
tions like dark matter substructure, asymmetric galaxy pro- 
files, interactions between individual galaxies and the cluster 
or even dark matter haloes without significant luminosity all 
of which would not be well represented by the typical para- 
metric methods. In these cases, where the number of pa- 
rameters is large, we may want to consider alternative non- 
parametric methods where all the previous problems do not 
have any effect on any of the assumptions. Also is in these 
situations where the number of parameters in both para- 
metric and non-parametric methods is comparable. When 
the number of parameters is comparable in both cases, it is 
interesing to explore non-parametric methods since they do 
not rely on the same assumptions. 

This paper does not pretend to be an attack to paramet- 
ric methods but a defense of non-parametric alternatives. 
Parametric methods are usually the best way to obtain in- 
formation about the gravitational potential (few multiple 
images, simple lens) and are not affeted by resolution prob- 
lems, specially in the center of the lens which can play a 
key role in reproducing the radial arcs. They however suffer 
of some potential problems, some of which have been out- 
lined above. These problems combined with the impressive 
quality of recent and upcoming data lead us in this paper 
to explore the potential to constrain the mass distribution 
without imposing priors on it. In other words we want to 
know what strong gravitational lensing images can tell us 



about cluster mass profiles whilst pretending we know noth- 
ing about the luminosity. Upcoming images of strong lens- 
ing in galaxy clusters will contain of the order of a hundred 
arcs and should make this a manageable task (Diego et al. 
2004). A non-parametric approach will provide an impor- 
tant consistency check, since concurring results would lend 
strength to the parametric approach, whereas any resulting 
differences would need to be addressed. 

Accepting this challenge we present in this paper a new 
method which makes use of all the available information in 
super-high quality strong lensing systems. The method has 
also been thoroughly tested with simulated lensing data with 
very good results. We thus address the crucial issue of how 
well we can reproduce both the cluster mass profile and the 
positions and shapes of the background galaxies. 

We want to stress that ours is by no means the first work 
proposing use of non-parametric methods in strong lensing. 
Among the non-parametric methods already available, are 
the pixellization methods of Saha et al. (1997, 2000), Ab- 
delsalam et al. (1998a, 1998b), WiUiams et al. (2001), who 
first established many of the ideas revisited in this work, as 
and also the multi-pole approach of Kochanek & Blandford 
(1991), Trotter et al. (2000). 

Naturally, our work shares many similarities with this 
former work, but there are also important and interesting 
differences. Firstly, in Saha et al. (1997) (and subsequent 
papers) , the authors divide the lens plane into a grid similar 
to the method presented here, but with the important dif- 
ference that their grid is fixed while our grid is dynamical. 
Our grid adapts to the new estimated mass at each step. 
This has important implications for the solution since high 
density regions will be sampled more heavily. These dense 
regions play a key role, particularly in the positions of the 
radial arcs. Not sampling these regions properly will neces- 
sarily lead to a biased mass distribution. 

Another important difference with Saha et al (1997) is 
that they make use of a prior on the mass distribution, pe- 
nalizing deviations from the distribution of luminous matter. 
They claim this prior does not play an important role but 
we find this questionable. It is hard to quantify the effect 
since the method apparently was not tested on simulated 
lensing data. In contrast, as mentioned above, our method 
is thoroughly tested with simulations to quantify how well 
the mass distribution is recovered. We do not use any prior 
other than a physical prior on the sizes of the sources. This 
prior is proved to be weak provided it is chosen with a min- 
imum of wisdom. 

A third important difference between the work pre- 
sented in this paper and others is that we show how to speed 
up the algorithm significantly by adopting techniques com- 
monly used in optimization problems. 

Moreover as an added novelty, our algorithm also in- 
cludes for the first time information about the null space. 
Rather than using only the information in the lensed arcs, 
we use information which has hitherto been overlooked; the 
areas in the sky where no arcs are observed. 

The paper is laid out as follows. In the next section 
we present the lens inversion problem, and provide a linear 
approximation. We then in section 3 present the simulated 
lensing data used to test the method, before we go on to 
present the adaptive gridification of the lens-plane, section 
4. In section 5 we describe the various inversion algorithms 
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and finally, in section 6 we introduce the use of the comple- 
mentary (null) space. 



2 THE PROBLEM FORMULATED IN ITS 
BASIC LINEAR FORM 

The fundamental problem in lens modelling is the following: 
Given the positions of lensed images, 6, what are the po- 
sitions of the corresponding background galaxies 3 and the 
mass distribution of the lens, M{6). Mathematically this en- 
tails inverting the lens equation 



f} = e-a{e,M{e)) 



(1) 



where d{d) is the deflection angle created by the lens which 
depends on the observed positions, 9. Prom now on we will 
omit the vector notation unless otherwise noted. The data 
points of our problem, 9, arc given by the x and y positions 
of each of the pixels forming the observed arcs. 

The deflection angle, a, at the position 9, is found by 
integrating the contributions from the whole mass distribu- 
tion. 

4G Du 



a{9) 



DsDi 



M{9') 



rd9' 



(2) 



where Dj^, Di, and are the angular distances from the 
lens to the source galaxy, the distance from the observer to 
the lens and the distance from the observer to the source 
galaxy respectively. In equation 2 wo have made the usual 
thin lens approximation so the mass M{9') is the projected 
mass along the line of sight 9' . Due to the (non- linear) de- 
pendency of the deflection angle, a on the position in the 
sky, 9, this problem is usually regarded as a typical exam- 
ple of a non-linear problem. We will see that this is only 
partially true. 

The next approximation we make is to split the lens 
plane in Nc small regions (hereafter cells) over which the 
projected mass is more or less constant. We can then rewrite 
equation (2) as; 

AG Du V- 9-9i 
'^(^) = ^D:A 2.-^^3^ (3) 

The first point we want to make here is that the deflection 

angle a may be thought of as the net contribution of many 
small masses rrn in the positions 9i, each one pulling the 
deflection in the direction of {9 — 9i) and with a magnitude 
which is proportional io rrn/ (9 — 9i). If we divide the lens 
plane in a grid with Nc cells, the masses rm can be consid- 
ered as the mass contained in the cell i (i = 1, ...,Nc)- If the 
cells are sufficiently small then the above pixellization of the 
mass plane will give a good approximation to the real mass 
distribution. 

Our second point is that the problem is non-linear in one 

direction but linear in the other. That is, given a position 
in the sky 9 (and given a lens) there is only one 13 which 
satisfy equation 1 but given a position of the background 
galaxy (3, there may be more than one position in the sky 
{9) satisfying equation 1 or equivalently, the source galaxy 
may appear lonscd in more than one position in the sky. The 
linear nature of the problem is evident when one realizes 
that the only non-linear variable, namely the 9 positions, 
are fixed by the observation and that the problem depends 




Figure 1. Original simulated mass profile. The total projected 

mass is 1.119 X 10"*^^ h"-*^ Mq in the field of view (0.1 degrees 
across) and the cluster is at z = 0.18. The units of the grey scale 
map arc 10"'^''/t^"'MQ/pixcl where each pixel is 0.494 arcscc'^. Also 
shown are the radial (small) and tangential (large) critical curves 
for a source at redshift z = 3. 



linearly on the unknowns (/3 positions and masses, rrii, in 
the cells). 

Let us now assume that we have a data set consist- 
ing of a system of radial and tangential strong lensing arcs 
which are spread over Ng pixels in the image. We will also 
assume that we know which arcs originate from the same 
sources. Since both the data and the mass distribution has 
been discretized we can rewrite equation (1) as a simple 
matrix equation: 



6>-TM 



(4) 



where 9 and /3 arc now 2Ne element vectors containing x and 
y values of the observed positions and the (unknown) source 
positions respectively. M is the mass vector containing all 
Nc mass cells, and T is the {2Ng x N^) matrix casting the 
mass vector into a vector of displacement angles. A more 
technical account of the make-up of the T matrix can be 
found in the appendix. Equation 4 clearly demonstrates 
the linear nature of the problem when formulated in this 
manner. The problem has now been reduced to a sot of 2Ng 
linear equations with 2Ne -\- Nc unknowns (lens masses and 
source galaxy positions). Notice that when the problem is 
formulated in this form, there are more unknowns than equa- 
tions which means it is an underdetermined system with an 
infinite number of solutions. In order to identify a suitable 
solution for such a system, we need to add extra information 
or impose constraints. 

One way of doing this is by reducing the number of un- 
knowns, for instance by removing the source positions from 
the unknown category. This can be achieved by minimizing 
the dispersion in the source plane, i.e. demanding that the 
pixels in the source plane be as concentrated as possible for 
each source. In this case we are left with only Nc unknowns. 
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Figure 2. The 13 sources at redshift z € (1,6). The field of 

view is 0.1 degrees across and is centered in the same point as 
figure 1. The solid line marcs the position of the caustics at 2: = 3 
correponding to the radial (dotted line) and tangential (solid line) 
critical curves. 

Another way of constraining the system is by assum- 
ing the Ns sources are well approximated by point sources, 
which reduces the number of unknowns to Nc + 2Ns . This 

means effectively demanding that all observed 6s for arcs 
corresponding to the same source, can be traced back 
through the lens to Ns single points. With this assumption 
we can rewrite the lens equation in the compact form of 

= rx. (5) 

r is now a matrix of dimension 2Ng x (Nc + 2Ns) and X 
is the vector of dimension (Nc + 2Ns) containing all the 
unknowns in our problem (see appendix A), the mass ele- 
ments and the 2Ns central (x- and y) coordinates of the Ns 
sources. Now the system is matematically overdetermined 
(more data points that unknowns) and has a unique point 
source solution. This unique solution can be found by nu- 
merical methods as we will see later. 

The linearization of the problem means that it is in 
principle solvable by both matrix inversion and simple lin- 
ear programming routines. In practice, the problem quickly 
becomes ill-conditioned and too large for direct matrix in- 
version, and approximate numerical methods are more suit- 
able. The main problem with the linearization is that we do 
not know if the obtained linearized solution creates artifi- 
cial tangential or radial axes. Checking this requires forward 
solving of the lens equation which is non-linear due to the 
complicated dependence of the deflection angle on 9. 

We suggest a novel approach to this problem, by using 
all the available information in the images, i.e. the informa- 
tion inherent in the dark areas (pixels containing no arcs) 
as well as the observed arcs. By pixellization the dark areas, 
tracing these pixels back through the lens and imposing that 
they fall outside the sources it is possible to find the true 
solution without over-predicting arcs. This use of the null 
space is to our knowledge unprecedented, and in principle 



allows for a complete, linear solution to a problem usually 
considered non-linear. 



3 SIMULATIONS 

Before proceeding to invert the system of linear equations 
(5) , it is instructive to take a closer look at the simulations 
which are going to be used to test the inversion algorithms. 
Given that most methods rely on the luminous mass dis- 
tribution, this is particularly important should the galaxies 
not accurately trace the dark matter. 

Our simulations consist of three elements. The first el- 
ement of the simulation is the projected mass distribution 
(M) in the lens plane. We simulate a generic mass distribu- 
tion of a cluster with a total projected mass of 1.119 x 10^^ 
h"'^ M0 in the field of view located at redshift z = 0.18. The 
field of view of our simulation is 0.1 degrees The mass profile 
is built from a superposition of 20 NFW profiles with added 
elhpticity. The halos' masses vary from 0.25 x 10^^ h"'^ M0 
to 2 X 10^^ h~^ M0. In the context of lensing, NFW halos 
seem to reproduce well the shear profile of meissive clus- 
ters up to several Mpc (Dahle et al. 2003, Kneib et al. 2003, 
Broadhurst et al. 2005). The final mass distribution is shown 
in figure 1. Also in the same figure we show the two critical 
curves. The interior one is the radial critical curve and the 
exterior one the tangential critical curve. Both curves have 
been calculated assuming the source is at redshift 2 = 3. 
The tangential critical curve is usually associated with the 
Einstein radius. Its large size (radius « 2 arcmin) is due to 
the unusually high projected mass of our cluster plus its rel- 
atively low redshift {z = 0.18). Although the deduced Ein- 
stein radius is larger than the one observed in clusters, the 
simulation presented here will serve the purpose of testing 
the different methods discussed here. More realistic simula- 
tions with proper mass distribution and source density will 
be presented in a future paper. 

The second element in our simulation are the sources 
(/?), for which we extract 13 sources from the HUDF (Hubble 
Ultra Deep Field, Beckwith et al. 2003). We assign them a 
redshift {z £ [1,6]) and size and place them in different 
positions behind the cluster plane. 

The third element are the lensed images {9) which are 
calculated from the first two elements (M and (3). This is 
done through a simple ray-tracing procedure. For each posi- 
tion 9 in the image, we calculate the deflection angle, a, and 
then the corresponding source plane position,/^, according to 
the lens equation (equation 1). If the calculated /3 coincides 
with one of the original sources, we assign to the lensed im- 
age the value (colour) of that source. Otherwise that point 
in the lensed image is left dark (value ). We repeat the op- 
eration for all the pixels (^s) to produce a complete image. 
Also, since the sources are small compared with our pixel 
size and to avoid missing some sources, we oversample our 
pixels by subdividing them and checking each pixel at dif- 
ferent locations (6* -|- with A9 < 1 and 9 £ [1, 512]) The 
original sources are plotted in figure 2 and the corresponding 
98 in the left panel of figure 3. 
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Figure 3. The sources lensed by the mass distribution of figure 1. The right image is the lensed image using 325 cells in the dynamical 
grid (see text). The left image is the exact solution using no griding. Note the differences in the radial arcs. 



4 GRIDIFYING THE MASS DISTRIBUTION 
4.1 The T matrix 

As laid out in section 2 the basis of our non-paramctric 
reconstruction method is the assumption that the real mass 
can be well approximated by a pixelized mass distribution. 
The T matrix is then the matrix which casts the mass vector 
into the vector of deflection angles 

ai = TijMj (6) 

For convenience rather than taking the mass distribu- 
tion in each cell to be constant we assume it follows a Gaus- 
sian distribution centred in the centre of the cell with some 

dispersion. This allows us to calculate analytically the lens- 
ing contribution from each mass cell, saving valuable com- 
puter time. We use a dispersion of 2a where a is the size of 
the cell, and we have confirmed with simulations that the 
lensed imago generated with this choice agrees well with the 
true lensed image using a relatively small number of cells. 
The detailed structure of the T matrix will be explained in 
appendix A. 



4.2 Multi resolution mass-grid 

Rather than taking a uniform grid, it is better to construct 
a dynamical- or multi-resolution grid. By sampling dense re- 
gions more heavily, it is possible to reduce drastically the 
number of cells needed to accurately reproduce the lens- 
ing properties of the cluster. In other words we choose an 
adaptive grid which samples the dense cluster centre better 
than the outer regions. Since we do not actually know the 
density profile of the cluster, this multi-resolution grid must 
be obtained through an iterative procedure. An example of 



this process can be found on the SLAP* (Strong Lensing 
Analysis Package) webpage. 

Given a mass estimate (a first mass estimate can be ob- 
tained with a coarse regular grid), we split a given cell into 
four sub-cells if the mass in the cell exceeds some threshold 
value. The lower this threshold, the higher the number of 
divisions and consequently the higher the final number of 
cells. The obtained grid can then be used for the next mass 
estimate, and the process can be repeated as necessary. Typ- 
ically the mass estimate will improve with each iterative step 
as this dynamical grid allows for the relevant regions of the 
cluster to become resolved. 

In figure 4 we show an example of a gridded version of 
the true mass in figure 1 for a threshold of Mthr = 8.0 x lO'^^ 
h~^ Mq. This grid has 325 cells. The corresponding (true) 
mass in the grid is shown in figure 5. The parameter Mthr 
or equivalently the number of cells, Nc, can be viewed as a 
free parameter but also as a prior. Fixing it means we are 
fixing the minimum scale or mass we are sensitive too. In a 
paralell paper (Diego et al. 2004) we explored the role of Nc 
and found that it can have a negative effect on the results 
if it takes very large values. A natural upper limit for Nc is 
2 times the number of pixels in the 6 vector minus 2 times 
the number of sources (the factor 2 accounts for the x and 
y positions). Our simulations show that a good election for 
Nc is normally 1/4 of this upper limit. 

For the shake of clarity it is useful to give some practical 
numbers. The image on the left of figure 3 has 512^ pixels 
from which 2156 pixels are part of one of the « 35 arcs so 
Ne = 2156. These arcs are coming from 13 sources {Ng = 13) 
and the lens plane is tipically divided in a few hundred cells 
{Nc ~ 300). 

* see http://darwin.cfa.haJvard.edu/SLAP/ 
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Figure 4. Dynamical grid for tlio mass in figure 1. This case 
corresponds to Mthr = 8.0 X 10^^ M©. 



5 INVERSION METHODS 

In this section we will describe some inversion methods 
which can be applied to solve the problem. Most of these 
methods can be found in popular books like Numerical 
Recipes (Press et al. 1997). All these algorithms are be- 
ing implemented in the package SLAP* which will be made 
available soon. 

Once we have the problem formulated in its linear form 
with all the unknowns on one side it is tempting to try a 
direct inversion of equation 5. Although the F matrix is not 
square, one can find its inverse, F"^, by decomposing F into 
orthogonal matrices. This is similar to finding the eigenval- 
ues and eigenvectors of F. This approximation has its ad- 
vantages as well as its drawbacks and we will explore this 
possibility later. However, we anticipate that degeneracies 
between neighboring pixels in the arcs as well as neighboring 
cells in the lens plane (not to mention the compact nature 
of the sources) will result in a system of linear equations 
which is not well behaved. The rank of the matrix F will 
be normally smaller than its smaller dimension. Calculating 
the inverse in this situation is not a trivial task. 

A second approach is rotating our system of linear equa- 
tions using a transformation which is just F"^. This trans- 
forms F into a square, symmetric and positive definite ma- 
trix of dimension {2Ns + Nc) x {2Ns +Nc), A = F'^F which 
is better behaved than the original F matrix. However, the 
rank of A is in generally smaller than its dimension and its 
inverse does not exist. The hope in this case must therefore 
be to find an approximate rather than exact solution to the 
system. 

The third approach is the simplest and will be explored 
first. We assume we know nothing about the sources other 
than their redshift and that they are much smaller than the 
strong lensing arcs. This is the same as saying that the lens 
has to be such that it focuses the arcs into compact sources 
at the desired (known) redshift. This simple argument alone 
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Figure 5. Masses in the cells of the dynamical grid of figure 4. 

will turn out to be powerful enough to get a quick but good 
first estimate of the mass distribution in the lens plane. 
We explore all these approaches below in reverse order. 

5.1 A first approach: Minimizing dispersion in the 
source plane 

In this subsection we will discuss the simplest (although ef- 
fective) method to get a fast estimation of the mass using 
no prior information on the lens nor the sources. The prob- 
lem then contains two sets of unknowns; The mass vector we 
want to determine, M, and the {3 positions of the sources. In 
general, for a finite number of observed arcs, there are sev- 
eral combinations of 13 and M which can reproduce the ob- 
servations. The most obvious unphysical solution is the null 
solution, where the mass is zero and the sources identical to 
the observed arcs. The easiest way to avoid such unphysical 
solutions is to minimize the variance of the {3 positions. This 
is equivalent to imposing that the vector M really acts as a 
true lens: We require big arcs with large magnifications and 
multiple images separated by arcsec or even arc-min to focus 
into a rather compact region in the source plane. This min- 
imization process assumes that we are able to associate the 
multiple lensed images with a particular source. This can be 
achieved with either spectroscopy or with morphology and 
the multicolor imaging of the arcs. 

To minimize the variance in the source plane it is illus- 
trative to follow the steepest descent path although other 
more effective minimization algorithms can be used (see be- 
low). Given an initial guess for the mass vector, one can 
calculate the derivative of the variance in the source plane 
as a function of the mass and minimize in the direction of 
the derivative. Once a minimum is found, we calculate the 
derivative in the new mass position and minimize again in 
an iterative procedure. 

The quantity to be minimized is : 

/(M) = ^a.^ (7) 
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Figure 6. Smooth version of the recovered mass after minimizing 

the variance. The total recovered mass is 1.01 X 10^^ h-i M0. 
Compare this mass with the original one in figure 1. 

where the sum is over the number of identified sources and 
ai is the variance of the source s in the source plane. That 
is; 

a^, /3 >? (8) 

where the f3's are calculated from equation 4 and the aver- 
age is over the /3's corresponding to source s. By combining 
equations 4 and 8 is easy to compute the derivative of CTg 
with respect to M. 

1^ = 2 < /? >,< T,- >, -2 < >, (9) 

where Tj is the column j of the T matrix and the average is 
made only on the elements associated with the source s. We 
should note that all equations involving the vectors or 
Q = TA4 have two components, x and y so there will be in 
fact two equations like equation 9. One for the x component 
of /? and the other one for the y component. At the end, 
the quantity we aim to minimize is = + a^. As we 
already mentioned, the minimization can be done following 
the path of steepest descent given by equation 9. This path 
will end in a minimum at the new mass . The process 
can be repeated by evaluating the new path at the new mass 
position until the variance is smaller than certain e. A good 
choice for e is to take a few times the expected variance for 
a population of Ns galaxies at the measured Ns redshifts. 
specific values for e will be discussed later. In practical 
terms the minimization is done through a series of iterations, 
gradually improving the dynamical grid as laid out in section 
4.2. For each iteration the ability of the mass distribution 
to focus the /3s into compact sources is improved. 

Equation 8 can be improved by weighting it with the 
amplification of the different images, assuming the errors 
in the imago plane are similar from images to images. This 
point will not be explored here. The minimization of the 
variance is a powerful and robust method for finding a first 
guess for the mass vector without making assumptions about 



Figure 7. Recovered sources after minimizing the variance. The 
real positions of the sources arc shown as crosses. Note that the 
recovered sources are farther away than the real ones. This com- 
pensates for the fact that the recovered mass is lower than the 
the true value. 

the sources. In fact, since the source positions can be esti- 
mated from equation 4, the minimization of the variance also 
provides us with an initial guess for these. The drawback is 
the slow convergence of the algorithm. A typical minimiza- 
tion may take several hours on a 1 GHz processor. In the 
next sub-section we will go a step further and we will include 
the 3 positions in the minimization as well as speed up the 
convergence by orders of magnitude. 

5.2 Biconjugate Gradient 

Inversion of linear systems where the matrix dimensions are 
of order 10'^, is a numerically trivial problem for today's 
computers provided the matrix is well behaved. If the ma- 
trix has null or negative eigenvalues, a direct inversion is not 
feasible and one has to aim to solve for some approximated 
solution. Our system of linear equations is a good example 
of an ill-conditioned one. Direct inversion of the matrix is 
not possible due to negative eigenvalues. However there is 
another important reason why we do not want to solve ex- 
actly (or invert) the system of equations. An exact solution 
means that we will recover a mass distribution which puts 
the arcs into delta function sources. As we will see later, this 
solution will be unphysical. Instead, we axe interested in an 
approximated solution which does not solve exactly the sys- 
tem of equations and which has a residual. This residual will 
have the physical meaning of the extension of the sources or 
the difference between the point-like sources and the real, 
extended ones. The biconjugate gradient (Press et al. 1997) 
will be a useful way to regularize our problem. 

The biconjugate gradient algorithm is one of the fastest 
and most powerful algorithms to solve for systems of linear 
equations. It is also extremely useful for finding approximate 
solutions for systems where no exact solutions exist or where 
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the exact solution is not the one we are interested in. The 
latter will be our case. Given a system of linear equations; 



Ax = b 



(10) 



a solution of this system can be found by minimizing the 
following function, 



f{x) = c — bx + ^x*Ax 



(11) 



where c is a constant. When the function f{x) is minimized, 

its gradient is zero. 



V/(x) = Ax-b = 



(12) 



That is, at the position of the minimum of the function f{x) 
we find the solution of equation (10). In most cases, finding 
the minimum of equation 11 is much easier than finding 
the solution of the system in 10 especially when no exact 
solution exists for 10 or A docs not have an inverse. 

The biconjugate gradient finds the minimum of equa- 
tion 11 (or equivalently, the solution of equation 10) by fol- 
lowing an iterative process which minimizes the function 
f{x) in a series of steps no longer than the dimension of the 
problem. The beauty of the algorithm is that the successive 
minimizations are carried out on a series of orthogonal con- 
jugate directions , pk, with respect to the metric A. That 
is. 



PiApj =0 j <i 



(13) 



This condition is useful when minimizing in a multidimen- 
sional space since it guarantees that successive minimiza- 
tions do not spoil the minimizations in previous steps. 

Let us now turn to the system we want to solve, namely 
equation 5. The biconjugate gradient method assumes that 
the matrix F ( matrix A in equation 10) is square. For our 
case this does not hold since we typically have Ng » {Nc + 
Ns). Instead we build a new quantity, called the square of 
the residual, i?^: 



rxf(e 



FX) 



r'^ex + ix^F^rx) 



(14) 
(15) 



By comparing equations 15 and 11 is easy to identify the 
terms, c = ^0'^9, b = and A = F'^F. Minimizing the 
quantity is equivalent to solving equation 5. To see this 
we only have to realize that 



fo - ^X = F' 



-FX) =F'i? 



(16) 



If an exact solution for equation 5 does not exist, the min- 
imum of R^ will give the better approximated solution to 
the system. The minimum can be now found easily with the 
biconjugate gradient (Press et al. 1997). For the case of sym- 
metric matrices A, the algorithm constructs two sequences 
of vectors and pk and two constants, Ok and f3k ; 



ak = 



T 

rk rk 



PkApk 
Tk+i =rk- OikApk 



rlrk 



Pk+l = Vk+l + PkPk 



(17) 
(18) 

(19) 
(20) 
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Figure 8. Recovered mass after minimizing using the bicon- 
jugate gradient algorithm. The mass has been smoothed with a 
Gaussian filter. The total recovered mass is 1.17 x 10^^ Mq. 
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Figure 9. Recovered /3's after minimizing R^. Again, crosses rep- 
resent the true position of the sources. The recovered /3o falls in 
the middle of its corresponding cloud /? points. 



At every iteration, an improved estimate of the solution is 
found by; 



Xfc+i = Xk + akPk 



(21) 



The algorithm starts with an initial guess for the solution, 
Xi , and chooses the residual and search direction in the first 
iteration to be; 



ri = pi = b — AXi 



(22) 



Note that pi is nothing but VJ?^. Thus the algorithm 
chooses as a first minimization direction the gradient of the 
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function to be minimized at the position of the first guess. 
Then it minimizes in directions which are conjugate to the 
previous ones until it reaches the minimum or the square of 
the residual is smaller than certain value e. 

The method has one potential pathological behavior 
when applied to our problem. One can not choose e to be 
arbitrarily small. If one chooses a very small e the algorithm 
will try to find a solution which focuses the arcs in A^^ sources 
which are delta functions. This is not surprising as we are 
assuming that all the 2Ng unknown (3s are reduced to just 
2A^s /3s, i.e the point source solution (see figures 10 and 11 
) . The mass distribution which accomplishes this is usu- 
ally very much biased compared to the right one. It shows a 
lot of substructure and it has largo fiuctuations in the lens 
plane. One therefore has to choose e with a wise criteria. 
Since the algorithm will stop when < e we should choose 
e to be an estimate of the expected dispersion of the sources 
at the specified rodshifts. This is the only prior which has to 
be given to the method. However, we will see later how the 
specific value of c is not very critical as long as it is within 
a factor of a few of the right source dispersion (see figure 
18 below). Instead of defining e in terms of R^ is better 
to define it in terms of the residual of the conjugate gradi- 
ent algorithm, r^. This will speed the minimization process 
significantly since we do not need to calculate the real dis- 
persion at each step but to use the already estimated rk- 
Both residuals are connected by the relation, 

Vk = r^R (23) 

Imposing a prior on the sizes of the sources means that we 
expect the residual of the lens equation, R, to take typical 
values of the order of the expected dispersion of the sources 
at the measured redshifts. Hence we can define an Rprior of 
the form; 

•Rprior = 0-i * RND (24) 

where the index i runs from 1 to Ng and Ui is the disper- 
sion (prior) assumed for the source associated to pixel i and 
RND is a random number uniformly distributed over -1 and 
1. Then, we can estimate e as; 

6 — T}^ Viz — ^py.^oy.rr Rprior (^^) 

When calculating e is is better to consider only the first Nc 
columns of F and discard the last 2Ns . This is recommended 
to avoid that the Is in this part of the F matrix do not cancel 
out when multiplied by the random number and dominate 
the much smaller Fij elements corresponding to the mass 
components. The last Nc columns of F should give con- 
tribution when multiplied by the random Rprior vector. If 
we chose as a prior that the sources are Gaussians with a 
a = 30h~^ kpc located at the measured redshift, this ren- 
ders e « 2 X 10"^^°. The reader will note that the chosen a is 
a few times larger than the one we would assign to a typical 
galaxy. We will discuss this point later. The final result is 
shown in figures 8 and 9. 

One has to be careful in not choosing the cTj very small. 
In fact, they should be larger than the real dispersion of the 
source. Only when the number of grid points, Nc, is large 
enough, can the gridded version of the true mass focus the 
arcs into sources which are similar in size to the real ones. If 
Nc is not large enough, the gridded version of the true mass 



focuses the arcs into sources which are larger than the real 
sources. We should take this into account when fixing ai. 

The reader can argue that a more clever way of includ- 
ing this prior information in the algorithm is by perturbing 
the /3 elements in equation 5 (or similarly, equation 5 in Ap- 
pendix A). This is done by adding some noise to the Is in the 
two 1 matrices in equation 37 (see Appendix A) . One could 
for instance add Gaussian noise with a dispersion similar 
to the expected dispersion of the source at redshift z. The 
reality however is that the quadratic nature of R^ cancels 
out any symmetric perturbation added to the elements of 
F. Thus, the result is similar if we perturb F or not and we 
still have to include the prior and fix e to be large enough 
so we do not recover the point source solution. This also 
tell us that this method is not very promising if one wants 
to include parity information in the recovery of the mass 
and sources. In the next subsection we will show a different 
approach which can include this parity information. 

5.3 Singular Value Decomposition 

The Singular Value Decomposition (hereafter SVD) algo- 
rithm allows for decomposition of a generic m x n (with 

m >= n) matrix A into the product of 3 matrices, two or- 
thogonal and one diagonal (e.g Press et al. 1997). 

A = UWV'^ (26) 

where ?7 is an m x n orthogonal matrix, is an n x n diag- 
onal matrix whose elements in the diagonal arc the singular 
values and is the transpose of an n x n orthogonal ma- 
trix. When A is symmetric, the SVD reduces to finding the 
eigenvectors and eigenvalues of A. 

The advantage of this decomposition is that the inverse 
of A is given by ; 

A-^ = VW~^U^ (27) 

where is another diagonal matrix whose elements are 

just the inverse of the elements of W, that is W~j^ = 
1.0/Wjj. The proof A~^A = I follows from the property 
that U and V'^ are orthogonal (U'^U = VV'^ = 7). 

In our case, we can use SVD to calculate the inverse of 
the F matrix and find the solution, X, directly by inverting 
equation (5) 

X = T-^e = vw'^u'^e (28) 

Although the SVD allows us to invert the problem by calcu- 
lating F~^, its full power lies in its ability to solve a system 
approximately. The level of approximation can be controlled 
by setting a threshold in the matrix W^^. In our problem, 
there will be many equations which are strongly correlated 
in the sense that most of the 9 positions in a single arc will 
come from the same source (that is, they will have almost 
the same 0). Also we have to keep in mind that we arc us- 
ing all the pixels in our data. This means that two equations 
corresponding to two neighboring pixels will look almost ex- 
actly the same. When one computes the SVD of the matrix 
F, these two facts translate into a matrix W with elements 
in the diagonal which arc or close to 0. The inverse of W 
would be dominated by these small numbers and the solu- 
tion will look very noisy. The good news about using SVD is 
that the most relevant information in the F matrix is packed 
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Figure 10. Recovered mass after minimizing using the bicon- 

jugatc gradient algorithm and with a very small e {point source 
solution). The total recovered mass is 2.43 x 10^^ h"^ Mq but 
there are also regions with negative masses. 
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Figure 11. Recovered /3's after minimizing for the point 
source solution. The real source positions are shown as crosses. 



into the first few values (the largest values in W) while the 
small single values in W will contain little information or 
the redundant information coming from neighboring pixels. 
One can just approximate W by another matrix W' where 
all the elements in its diagonal smaller than certain thresh- 
old are set to 0. Also, in the inverse of W', these elements 
are set to 0. The magic of this trick is that the solution 
found with this approximation will contain the main trend 
or main components of the mass distribution. 

Another advantage of using the SVD algorithm is that 
in this case no prior on the extension of the sources is needed. 
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Figure 12. Recovered mass after SVD (no prior). The total re- 
covered mass is 1.01 X 10^5 h-i M0. 

The degree of accuracy is controlled by setting the thresh- 
old in the singular values of the matrix W. Those elements 
in W below the threshold arc sot to and the same in its 
inverse. The threshold is usually set after looking at the sin- 
gular values. The first ones will normally stay in some kind 
of plateau and after then the singular values will decrease 
rapidly. The threshold should be normally set immediately 
after the plateau. In figures 12 and 13 wc show the result af- 
ter decomposing F in its SVD decomposition and calculating 
its inverse with (27). 

Like the two previous algorithms, the SVD has its own 
pathologies. Using standard subroutines to find the SVD of 
r usually return a no convergence error. This error comes 
from the nearly degenerate nature of F. One has then to in- 
crease the number of maximum iterations on these subrou- 
tines or use a coarse version of F where only a small fraction 
of the 9 positions (or equivalently, F rows) arc considered. 
Another solution is inverting the preconditioned system of 
equations where we previously multiply for F^. This allows 
to use all the 9 pixels and find the SVD of F^F in a small 
number of iterations. However, using the SVD of F instead 
of F^F has a very interesting feature. It allows to introduce 
parity information in an effective way. 

Contrary to the quadratic cases of minimization of the 
variance or the square of the residual R^, the SVD allows us 
to include parity information in the F matrix which will not 
disappear when wo look for the best solution. Since no r^9 
or F^F operations are involved, parity information will not 
cancel out. SVD could be an interesting way of fine tuning 
the solution by including the extra information coming from 
the parity of the arcs. 

6 INCORPORATING THE NULL SPACE 

So far we have made use of the information contained in the 
observed strong lensing arcs. This gives us a solution which 
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Figure 13. Recovered sources after SVD (no prior). The real 
positions of the sources are shown as crosses 



explains the data in the sense that it predicts arcs in the 
right positions but it could well happen that the solution 
over-predicts arcs. An example of this can bee seen by com- 
paring the reproduced arcs in fig. (16) with the true arcs in 
fig. (3). Some of the reproduced arcs are slightly larger and 
span a larger area than the true ones, although the sources 
are perfectly reproduced in their true positions and the re- 
covered mass is very close to the true mass distribution. 

To avoid this we propose for the first time to include 
information from the null space, 9. That is, the part of the 
image which does not contain any arc. This space tell us 
where an arc should not appear but it does not tell us where 
the hypothetical B should be. The only fair thing we can do 
is to impose that none of the pixels in the null space fall into 
the estimated /? of our solution. By achieving this we will 
have a solution which predicts the right arcs while not over- 
predicting arcs. The solution will be then fully consistent 
with the observed data. 

The null space is connected to the solution X by; 



(3^6- TM 



(29) 



It is evident that wo want the new solution X (M and /3o) to 
be such that the new $ do not fall within a circle of radius 
p{k) centered in each one of the f3o where p{k) is the prior 
with information on how extended are the sources. The null 
space will perturb the solution X in such a way that the 
new solution, X' = X + AX is an approximated solution of 
equation 5 and satisfies all the constraints of the form $ 3 (3. 

The way we incorporate the constraints is by adopt- 
ing an approach commonly used in penalized quadratic pro- 
gramming. We will minimize the new function 



^{x) = R^ + \Y,g{h) 



(30) 



where the A is a constant which guarantees that in the first 
iterations, the second term in equation 30 does not dominate 
the first and gu is a function which will penalize those models 



predicting (3 falling near the (3 positions which minimize R . 
As a penalizing function we will choose an asymptotically 
divergent Gaussian 



9{fk) 



ef'' — n 



with. 



(31) 



(32) 



Where ${k)x is the x component of /3 for the pixel k in the 
null space. /?° is our estimated value of (5 (x component) and 
similarly for the y component. There are as many constraints 
of the form fk as there are pixels, 9, in the null space. 

The parameter ji controls the degree of divergence of 
the Gaussian function. When ;[i = we recover the classi- 
cal Gaussian but as /i approaches 1 the Gaussian becomes 
more and more sharply without increasing its dispersion. For 
11=1 the Gaussian is infinite at /fe = (see figure 14. By 
minimizing the function (j){X) with increasing values of ji we 
will find that in the limit /i — » 1 the solution will push away 
those /3 which originally were falling in the region defined 
by the sot of f). 

Ideally we want to include all the dark or empty pixels 
in the 6 space but this is, in general, found to be a waste 
of memory and computing time. The fact is that only the 9 
pixels which are hitting (or close to hit) one of the Ns esti- 
mated (3 positions of the sources will have some impact on 
the solution. Most of the 9 pixels in the null space already 
satisfy all the constraints for the actual solution X. For this 
reason we will include only those 9 for which the solution X 
predicts their corresponding (3 arc close to hitting (or actu- 
ally hitting) a source. The 9 space will include the observed 
^'s as a subspace. We have to exclude this subspace from /3 
before minimizing equation 30. This is, again, just an opti- 
mization process. In practice, one can include all the pixels 
in the image (excluding those containing part of an arc) in 
the null space. 

After the minimization process is finished, the new so- 
lution will have the arcs falling in compact regions around 
Po while the extra-arcs produced by the previous solution 
will fall in regions outside areas around the /Jo- 

^From our simulation we have seen that addition of the 
null space induces small changes in the mass plane and it 
tends to stabilize the solution in the sense that it makes the 
recovered mass profile independent of the threshold e. This 
is in fact an interesting bonus which comes with the addition 
of the null space. The new function to be minimized, 0, can 
be minimized until the true minimum is found. In this case, 
there is no equivalent of a point source solution. Since some 
of the 9 are in fact very close to the observed 9, the solution 
which minimizes (j) will focus those 9 and 9 in neighboring 
regions in the source plane. When minimizing there will be 
two competing effects. One will tend to increase the mass 
so it minimizes Ft^ (point source solution). The other will 
tend to reduce the mass so the /3 will be pushed away from 
the P positions. The outcome will be a balanced situation 
between the P trying to collapse into compact sources and 
the P trying to escape the wells in the beta positions. 

To accurately quantify the effect more simulations are 
needed. This will be done in a future paper but as an illustra- 
tion we show in figure 15 the recovered mass after imposing 
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Figure 14. Penalizing function showing values of /i = 
0.8, 0.9, 0.95, 0.97, and 0.98 and c = 1. Note that the width of 
the curve does not change when increasing //. 
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Figure 15. Recovered mass (smoothed) after minimizing (f). The 

crosses show the position of the three main halos in the cluster. 
The size of the cross is proportional to the mass of the halo. The 
total recovered mass is 1.1055 X lOi5/i-iM0. 



the constraints in the 9 space. The total mass is now only 
1.2 % lower than the true mass. The new mass also contains 
more structure and staxts showing the internal distribution 
of the main components of the cluster. 

In figure 16 we show the predicted position for the arcs 
after combining the best mass together with the best esti- 
mate for the position of the sources. To compute the arcs 
we have assumed that each source is a circle with a radius 
15ft~^kpc centred in the estimated best source positions and 
at the measured rcdshifts. By comparing figures 16 and fig- 
ure 3 (left) we see that the predicted arcs matches very well 
the observed (simulated) data with the exception of some of 
the arcs near the center of the image. 



1 arcmin ' 




Figure 16. The plot shows the predicted arcs according to the 

best mass and source solution. Wc have assumed that the sources 
arc circles with radius centred in the source position 

and at the measured rcdshifts. This result should be compared 
with the true arcs seen in figure 3. The match is almost perfect. 

7 DISCUSSION 

In this paper we have presented several approaches to non- 
parametric lens modelling. Using no information about lu- 
minosity distributions whatsoever, these methods perform 
remarkably well on simulated strong lensing data. One of 
the main conclusions of this work is simply that it works; It 
is possible to recover information about the mass distribu- 
tion without using prior information on the same if one has 
a sufficiently large number of arcs. This can be seen in fig- 
ure 17 where we show the recovered mass profiles compared 
with the original one. The recovered mass traces the real 
mass distribution up to furthest data point in the simulated 
image. Beyond this point, the recovered profile is insensitive 
to the data. When the minimization stops, the cells in the 
outer regions stay with a mass close to the one they had in 
the first step of the minimization. This point is discussed in 
more detail in Diego et al. (2004). 

We are optimistic about the performance of these meth- 
ods when used on future data, but we would also like to 
emphasize the potential pathologies. We will now discuss 
the major issues. As we have seen in section (2) the inver- 
sion algorithms rely on a linearization of the lens equation. 
We achieve that by decomposing the lens plane into small 
mass-cells and assuming that the gridded mass is a fair rep- 
resentation of the true underlying mass. This is true when 
the number of cells in the grid is large enough, but for a uni- 
form grid this may mean using several thousand mass-cells, 
making the problem ill conditioned and underdetermined. 
By inverting the lens equation in a series of iterations we 
introduce an adaptive grid which optimizes the number of 
cells by sampling dense regions more heavily and using larger 
cells for underdense regions. This allows for good sampling of 
the lens without a huge number of cells. However, the grid- 
ded mass plane is still an approximation to the true mass 



© 0000 RAS, MNRAS 000, 1-16 



Inverting the lens 13 



50 100 150 

9 (orcsec) 

Figure 17. Original profile (solid line) compared with the re- 
covered ones. Dashed line is the biconjugate gradient case, dot- 
dashed the SVD case, dotted the case is when include the null 
space. The minimum of the variance is similar but with less mass 
in the tails. All profiles have been normalized by the critical sur- 
face density for a source at redshift, Zs = 3. 
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Figure 18. Total mass as a function of the threshold e used in 
the biconjugate gradient minimization. The true mass is 1.119 X 
1015/i-1Mq. 



plane so we expect the solution to be an approximation to 
the true solution as well. Since a solution comprises not only 
the masses but the positions and extents of the sources as 
well, this means that we should expect these to also be ap- 
proximations to the true ones. This has already been iden- 
tified as one of the potential pathologies of the algorithm. 
Namely, if we try to focus the sources into very compact 
regions, with sizes comparable to the typical galaxy sizes at 
the relevant redshifts, the obtained mass is in general differ- 
ent to the true mass. In fact the best results arc obtained 
when the mass plane focuses the arcs into regions which are 
a few times larger than the extent of the true sources. As we 
pointed out before, this reconstructed mass corresponds to a 
short-sighted version of the lens. This problem can be over- 
come by requiring the minimization algorithms stop once 
the recovered sources are a few times larger than the true 
sources. The extent of the true sources can be guessed from 
the redshift. 



Introducing such a prior begs the question: How sensi- 
tive is the solution to the specific guess of prior? The answer 
is that it depends on exactly "how bad" that choice is. As 
long as we assume a source size significantly larger than the 
true sizes the actual solution does not change much and re- 
sembles well the true mass distribution. However, when try- 
ing to approach the true source size, the solutions changes 
rapidly away from the realistic model. The situation is shown 
in fig. (18). As shown in section 5.2 a given physical size cor- 
responds to a given threshold, c, and as long as this threshold 
is sufficiently large, > 10"^'^, (corresponding to a physical 
size of > 30h~^ kpc) the total mass is well behaved. If we 
instead demand that the physical size of the reconstructed 
sources be more realistic, say ^ 12/1^^^ kpc, we get a thresh- 



old of 



10" 



for which the mass distribution is already 



starting to diverge away from the true mass. We therefore 
want the recovered sources to be larger than the true ones. 
In other words we want to recover a short-sighted cluster! In 
previous sections we have demonstrated non-parametric lens 
modelling using several different algorithms with promising 
results. However we have not yet addressed the question of 
uniqueness. Is there a unique solution, and if not, how dif- 
ferent arc the possible solutions? 

The good news is that minimizing quadratic functions 
like the variance or the square of the residual guarantees that 
there is only one absolute minimum. This absolute minimum 
corresponds to the point source solution. The bad news is 
that this is not the solution we arc looking for. As shown 
above, trying to focus the sources too much introduces arti- 
facts in the mass distribution, so we need to stop the mini- 
mization at some step before the absolute minimum. In two 
dimensions it is easy to visualize that the quadratic func- 
tion will have the shape of a valley, and stopping the min- 
imization at some point before the actual minimum means 
choosing an ellipse on which all solutions are equally good. 
In many dimensions this is harder to visualize but we expect 
our obtainable solutions to lie on an N-dimensional ellipsoid 
around the minimum. 

To get a quantitative grasp on how much these solutions 
differ in mass and source positions, we solve the equations 
for a range of random initial conditions. This is a manage- 
able task since our minimization algorithm is extremely fast, 
taking only about ~ 1 second on a IGHz processor. Also for 
speed purposes, we fix the grid to the one corresponding to 
the solution shown in figure 8. Fixing the grid speeds the 
process significantly since the F^^r matrix and the F^d vec- 
tor do not need to be recalculated in each minimization. 
In Diego et al. (2004) a similar process is followed carrying 
multiple minimizations but this time the grid is changed dy- 
namically based on the solution found in the previous step. 
The authors found on that paper that the dispersion of the 
solutions increases due to the extra variability of the grid. 

The result is shown in figure 19. The minimization pro- 
cess described in section 5.2 will stop at a different point in 
the N-dimensional ellipsoid for each set of different initial 
conditions. If the total mass of the initial mass distribution 
is very low the minimization will stop at solutions with mass 
below the true mass and /3 positions further away from the 
center of the cluster than the real ones (dots). If instead 
we start with a total mass much larger than the true value, 
the minimization process returns higher masses and /? po- 
sitions closer to the center of the potential (crosses). The 
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situation improves when we impose that the total mass of 
the initial distribution have a reasonable value. In this case, 
the minimization stops in a region close to both the right 
mass and the right /3s. This argument motivates an iterative 
minimization where successive estimations for the mass are 
obtained at each step and used in the next. 

Among the three algorithms presented in this work, the 
minimization of the variance is the less powerfull due to its 
low convergence. It is however interesting from a pedagog- 
ical point of view since it shows the existing degeneracies 
between the total mass of the cluster and the positions of 
the sources. The biconjugate gradient is orders of magnitude 
faster and is capable of finding the point source solution in a 
few seconds. The point source solution is however unphysical 
and a prior associated to the size of the sources is needed in 
order to stop the minimization process at the proper place. 
The good news are that the algorithm shows a weak sen- 
sitivity to this prior provided it is chosen with a minimum 
of wisdom. Since the point where the minimization stops 
depends on where the minimization starts, this also allows 
to study the range of possible models consistent with the 
data by minimizing many times while changing the initial 
conditions. This feature makes the biconjugate gradient very 
attractive for studying the space of solutions. This space can 
be reduced by including the extrarinformation contained in 
the null space, that is, the areas in the sky with no observed 
arcs. We have seen how this information can be naturally in- 
cluded in the minimization process by introducing a penalty 
function. 

Finally, the SVD has the interesting feature that it al- 
lows to add extra information regarding the parity or re- 
solved features in the arcs. This possibility has not been 
studied in detail in this paper but is definitely worth ex- 
ploring since it would allow to recover smaller details in the 
mass distribution. The main drawbacks of the SVD is that 
the decomposition fails to converge when all the informa- 
tion is used and one has to use a coarse version of the data 
instead. The second drawback is that some of the singular 
values in the matrix W are very small. These values will 
dominate the inverse if they are not masked by a threshold 
in the matrix W . Choosing the value for this threshold is 
not very critical as long as its amplitude is large enough to 
mask the small singular values in W. 

The accuracy of the methods presented in this paper al- 
low for high precision non-parametric mass reconstructions 
and direct mapping of the dark matter distribution of clus- 
ters. Previous works have suggested some discrepancy be- 
tween mass estimates derived from different data (Wu & 
Fang 1997). Accuracy, combined with the speed of the algo- 
rithm opens the door to cosniological studies. Strong lens- 
ing analyses were predicted to yield interesting cosmologi- 
cal constraints, but due to the uncertainties in our under- 
standing of galaxy clusters they have yet to live up to these 
predictions. A fast, per-ccnt level determination of cluster 
masses from lensing observations, could allow for sufficient 
statistical sampling to provide information about cosmolog- 
ical parameters. An interesting follow up to this work would 
be to establish what number of strong lensing systems, and 
what quality of data is necessary in order to make interesting 
constraints using our methods. 

However the methods presented here should not be 
applied indiscriminately. For instance, if the reconstructed 
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Figure 19. Dispersion in tlic solutions for three different sets 
of initial conditions. Upper left (dots), we start with a random 
realization of small masses. Asterisks in the center, starting from 
a random realization but with a total mass of M 1.06. Crosses 
(bottom right), starting with high random masses. In all cases, the 
threshold was set to e = 2 X 10^^" and the starting f3 positions 
where chosen as random in a box of 100 x 100 centered in the 
center of the image. 

mass is systematically biased toward recovering less cuspy 
central regions this reduces the possibility of making con- 
clusions about mechanisms for dark matter annihilation 
(Spcrgcl & Steinhardt 2000, Wyithe et al. 2001, Boehm et 
al. 2004). If one is interested in using our methods to discuss 
this, that bias must be quantified. Due to lens resolution is- 
sues we anticipate that our reconstruction algorithms indeed 
have a bias in this direction and it is likely that other algo- 
rithms may suffer from similar problems. These and other 
poetential systematic errors shall be investigated in future 
works. 

7.1 Future work 

This paper is not intended to be an exhaustive exploration of 
the accuracy of non-parametric mass reconstruction meth- 
ods but rather to help re-ignite debate and competition in 
the field, and demonstrate the feasibility and power of such 
methods in the face of new data. Specific issues such as mag- 
nitudes of the mass profile bias, source morphologies, sur- 
face brightness of the arcs, parity, projection effects as well 
as other relevant issues discussed above should be explored 
in order to improve the results. Special emphasis should be 
paid to the effects the grididification of the lens plane has on 
the results. An attempt to study this issue has been made on 
a second paper (Diego et al 2004) where the authors have 
shown that the grid may play an important role. In this 
paper we have presented the results using a very specific 
simulation. Although the algorithm has been tested with 
different sinmlations with positive results, it would be de- 
sirable to make a more detailed study of how the result is 
affected by issues like the structure of the lens, its symme- 
try or the number of sources and accuracy in their redshifts. 
Such a study would be even more interesting if a compara- 
tive study between parametric and non-parametric methods 
is done using the same simulations. Also interesting is to 
explore the effects of adding constraints in the mass of the 
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type Mi > which may help to stabihze the solution even 
in the absence of a prior on the source sizes. This can be 
accomplished by adopting the techniques used in quadratic 
programming. All theses issues, although very interesting, 
arc beyond the scope of this paper and will be studied in 
subsequent papers. 

Although not discussed in this paper, a very interesting 
piece of work would be about the potentiality of an accu- 
rate strong lonsing analysis as a cosmological tool (c.g Link 
& Pierce 1998, Yamamoto ct al. 2001, Golsc ct al. 2002, 
Soucail et al. 2004). Previous works (Yamamoto et al. 2001, 
Chiba & Takahashi 2002, Sereno 2002, Dalai et al. 2004) 
suggest that the Icnsing obscrvablcs arc primarily dependent 
on the lens model while the dependency in the cosmologi- 
cal parameters is minor. Constraining the lens model with 
accuracy can open the window to do cosmology with strong 
lensing images. Is easy to imagine that a single image of a 
cluster with dozens of arcs coming from sources at differ- 
ent redshifts will constrain the lens model rather accurately 
and then, it will have something to say about the cosmology 
given the large number of distances involved in the analysis. 
All these issues axe of great interest and they are intended 
to be studied in subsequent papers. 
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APPENDIX A: THE T AND T MATRICIES 

The T matrix contains the information of how each 
mass element j affects the i*^ deflection angle. That 
is 

ai = TijMj. (33) 

Precisely how the matrix is constructed is a matter of 
convenience. Our specific choice is to put both x and y 
components of all the arc pixels into vectors with 2Nff 
elements. The resulting T is then a 2N0 x Nceiis matrix. 



© 0000 RAS, MNRAS 000, 1-16 



16 Diego et al. 



In fact the structure of the T matrix and the vec- 
tors are irrelevant as long as they combine to correctly 
represent the lens equation as 

(3i=9,-T,,M,. (34) 
Each element in the T matrix is computed as fol- 



lows. 

Tx(i,j) = A[l 
where 



4G Dis 
c2 DiDs 



(35) 



(36) 



The index i runs over the observed 9 pixels, in- 
dex j runs over the Nc elements in the mass vector, 

M. The factor in equation (35) is just the differ- 
ence (in radians) between the x position in the arc (or 
x of pixel 9i) and the x position of the cell j in the 
mass grid {6x — 9x{i) — 9'^{j)). Similarly we can define 

Sy = 9y{i) - 9'y{j) and S = y^fT^. Also, for wc 

only have to change 6^ by 6y. Since we include the fac- 
tor IO-^^Mq in A (see equation 36), the mass vector M 
in equation 4 will be given in 10^^ h~^ Mq units. The 
dependency comes from the fact that in A we have 
the ratio Dig/ (DiDs) which goes as h. Also wc calcu- 
late Tx and Tj, separately, the final T matrix entering 
in equation 4 contains both, T^, and Ty (the same holds 
for the vectors /3 and 9). One can rearrange the x and 
y components in any order. 

The structure of the F matrix is identical to the 
matrix T but with the difference that it has 2Ns ad- 
ditional columns (the location of the extra-columns is 
irrelevant as long as it is consistent with the location of 
the 2Ns Po unknowns in the X vector). Is easy to sec 
that each one of these extra columns (with dimension 
2 * Ne) corresponds to one of the Ng sources. Since the 
r matrix has to contain both the x and y component, 
the first/second half of each one of the extra columns 
will be all O's depending on if it corresponds to the y/x 
component of Bg. The other half will be full of O's and 
I's, the I's being in the positions associated with that 
particular source, the O's elsewhere. That is, the lens 
equation can be written explicitly as (d denotes matrix 
and a denotes vector); 



(37) 



Where again, 6^ and Oy are two Ne dimensional vectors 
containing the x and y positions, respectively, of the 
pixels in the observed arcs. The two [Ng x Nc matrices 
Ta, and Ty contain the x and y lensing effect of the cell 
J on the 9 pixel i. The N0 x Ng dimensional matrices 
1 and C) are full of O's (the matrix) or contain I's 
(the 1 matrix) in the i positions (i S [l.Ne]) where 
the ith 9 pixel comes from the j source (j e [l,A^s]) 

and elsewhere. The vector M contains the Nc griddcd 
masses we want to estimate. (5^ contains the central x 
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positions of the N^ sources. Similarly, /3o will contain 
the central y positions. 

This paper has been produced using the Royal Astronomical 
Society /Blackwell Science M^jX style file. 
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